Exploration and Exploitation in Parkinson’s Disease: Computational Analyses
Björn Meder, Health and Medical University, Potsdam, Germany
Martha Sterf, Medical School Berlin, Berlin, Germany
Charley M. Wu, University of Tübingen, Tübingen, Germany
Matthias Guggenmos, Health and Medical University, Potsdam, Germany
Published: August 2, 2025
Code
# Housekeeping: Load packages and helper functions
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.align = 'center')
options(knitr.kable.NA = '')

packages <- c('gridExtra', 'BayesFactor', 'tidyverse', "RColorBrewer", "lme4", "sjPlot", "lsr", "brms",
              "kableExtra", "afex", "emmeans", "viridis", "ggpubr", "hms", "scales", "cowplot",
              "waffle", "ggthemes", "parameters", "rstatix", "magick", "grid")
# install packages that are not yet available
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}
# Load all packages
lapply(packages, require, character.only = TRUE)

set.seed(0815)

# file with various statistical functions, among other things it provides tests for Bayes Factors (BFs)
source('statisticalTests.R')

# Wrapper for brm models such that it saves the full model the first time it is run, otherwise it loads it from disk
run_model <- function(expr, modelName, path = 'brm', reuse = TRUE) {
  path <- paste0(path, '/', modelName, ".brm")
  fit <- NULL  # defined up front so that reuse = FALSE does not fail below
  if (reuse) {
    fit <- suppressWarnings(try(readRDS(path), silent = TRUE))
  }
  # fit (and cache) the model if reuse is disabled or no cached model could be read
  if (is.null(fit) || inherits(fit, "try-error")) {
    fit <- eval(expr)
    saveRDS(fit, file = path)
  }
  fit
}

# Setting some plotting params
w_box <- 0.2  # width of boxplot, also used for jittering points and lines
line_jitter <- w_box / 2
xAnnotate <- -0.3
# jitter params
jit_height <- 0.01
jit_width <- 0.05
jit_alpha <- 0.6
# colors for participant groups
groupcolors <- c("#7570b3", "#1b9e77", "#d95f02")
choice3_colors <- c("#e7298a", "#66a61e", "#e6ab02")
Complementing the behavioral analyses, we study exploration and exploitation in PD through the lens of a computational model, the Gaussian Process Upper Confidence Bound (GP-UCB) model. This model integrates similarity-based generalization with two distinct exploration mechanisms: directed exploration, which seeks to reduce uncertainty about rewards, and random exploration, which adds stochastic noise to the search process without being directed towards a particular goal (Wu et al., 2018; Wu et al., 2025). In previous research using the same paradigm, this model has provided the best account of human behavior and enabled the decomposition of exploration into distinct mechanisms (Giron et al., 2023; Meder et al., 2021; Schulz et al., 2019; Wu et al., 2018; Wu et al., 2020).
1.1 Gaussian Process Upper Confidence Bound (GP-UCB) Model
The GP-UCB model comprises three components:
a learning model, which uses Bayesian inference to generate predictions about the rewards associated with each option (tile),
a sampling strategy, which uses reward expectations and associated uncertainty to evaluate how promising each option is, and
a choice rule, which converts options’ values into choice probabilities.
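For orientation, the three components can be written as follows, in the form commonly used in this line of work (e.g., Wu et al., 2018). This is a sketch rather than a specification of the present implementation: the observation-noise variance \(\sigma_\epsilon^2\) and the exact scaling of the kernel are assumptions. Given observed options \(\mathbf{X}_t\) with rewards \(\mathbf{y}_t\), a Gaussian process with an RBF kernel yields a posterior mean \(m_t(\mathbf{x})\) and variance \(v_t(\mathbf{x})\) for every option \(\mathbf{x}\), which are combined into UCB values and passed through a softmax:

\[
k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{\lambda}\right)
\]
\[
m_t(\mathbf{x}) = \mathbf{k}_t(\mathbf{x})^\top \left(\mathbf{K}_t + \sigma_\epsilon^2 \mathbf{I}\right)^{-1} \mathbf{y}_t,
\qquad
v_t(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}_t(\mathbf{x})^\top \left(\mathbf{K}_t + \sigma_\epsilon^2 \mathbf{I}\right)^{-1} \mathbf{k}_t(\mathbf{x})
\]
\[
\mathrm{UCB}(\mathbf{x}) = m_t(\mathbf{x}) + \beta \sqrt{v_t(\mathbf{x})}
\]
\[
P(\mathbf{x}) = \frac{\exp\left(\mathrm{UCB}(\mathbf{x}) / \tau\right)}{\sum_{j=1}^{64} \exp\left(\mathrm{UCB}(\mathbf{x}_j) / \tau\right)}
\]

Here \(\mathbf{K}_t\) is the kernel matrix of the observed options and \(\mathbf{k}_t(\mathbf{x})\) the vector of kernel values between \(\mathbf{x}\) and the observed options. Larger \(\lambda\) spreads the influence of each observation further across the grid, \(\beta = 0\) reduces UCB to the posterior mean, and in this parameterization larger \(\tau\) makes choices more uniform.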
Note
Add details
1.1.1 Learning Model
1.1.2 Sampling Strategy
1.1.3 Choice rule
1.1.4 Model parameters
Associated with each model component is a free parameter that we estimate through out-of-sample cross-validation. These parameters provide a window into distinct aspects of learning and exploration:
The length-scale parameter \(\lambda\) of the radial basis function (RBF) kernel captures how strongly a participant generalizes based on the observed evidence, i.e., the rewards obtained from previous choices.
The uncertainty bonus \(\beta\) represents the level of directed exploration, i.e., how much expected rewards are inflated by their uncertainty.
The temperature parameter \(\tau\) corresponds to the amount of sampling noise, i.e., the extent of random exploration.
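To make the roles of \(\lambda\), \(\beta\), and \(\tau\) concrete, here is a small base-R sketch that computes GP-UCB choice probabilities on a grid of 64 tiles, assuming an 8 × 8 layout. It further assumes the kernel \(\exp(-d^2/\lambda)\), a GP prior mean of zero, a small fixed observation noise, and a softmax over UCB values with temperature \(\tau\). The helper names, the two revealed tiles, their rewards, and all parameter values are hypothetical; this is not the fitting code used for the reported analyses.

```r
# GP-UCB choice probabilities on an 8 x 8 grid (hypothetical sketch).
# Assumptions: RBF kernel exp(-d^2 / lambda), zero prior mean,
# fixed observation noise, softmax over UCB values with temperature tau.

rbf_kernel <- function(X1, X2, lambda) {
  # squared Euclidean distances between all rows of X1 and all rows of X2
  d2 <- outer(rowSums(X1^2), rowSums(X2^2), "+") - 2 * X1 %*% t(X2)
  exp(-d2 / lambda)
}

gp_posterior <- function(X_obs, y_obs, X_all, lambda, noise_var = 1e-4) {
  K <- rbf_kernel(X_obs, X_obs, lambda) + diag(noise_var, nrow(X_obs))
  K_s <- rbf_kernel(X_all, X_obs, lambda)   # kernel between all options and observed ones
  A <- K_s %*% solve(K)                     # weights applied to the observed rewards
  post_mean <- as.vector(A %*% y_obs)
  post_var <- 1 - rowSums(A * K_s)          # prior variance k(x, x) = 1 for this kernel
  list(mean = post_mean, sd = sqrt(pmax(post_var, 0)))
}

# upper confidence bound: expected reward plus beta times uncertainty
ucb <- function(post, beta) post$mean + beta * post$sd

softmax_probs <- function(values, tau) {
  z <- values / tau
  p <- exp(z - max(z))                      # subtract the max for numerical stability
  p / sum(p)
}

tiles <- as.matrix(expand.grid(x = 1:8, y = 1:8))  # 64 tile coordinates
obs_idx <- c(1, 28)                                 # hypothetical revealed tiles
y_obs <- c(0.2, 0.6)                                # hypothetical rewards
post <- gp_posterior(tiles[obs_idx, , drop = FALSE], y_obs, tiles, lambda = 2)
p <- softmax_probs(ucb(post, beta = 0.5), tau = 0.05)
```

In this sketch, raising `beta` shifts probability toward tiles far from the revealed ones (large `post$sd`), raising `lambda` lets the two observed rewards inform more distant tiles, and raising `tau` flattens `p` toward 1/64.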
2 Model comparison
We tested how well the GP-UCB model predicts each participant's search and decision-making behavior. To assess the contribution of each model component (generalization, uncertainty-directed exploration, and random exploration), we compared the predictive accuracy of the GP-UCB model to that of model variants in which each component was lesioned:
\(\lambda\) lesion model: This model removes the ability to generalize, meaning that all options are learned independently (via a Bayesian mean tracker).
\(\beta\) lesion model: No uncertainty-directed exploration (\(\beta=0\)), i.e., options are valued solely based on reward expectations (mean greedy).
\(\tau\) lesion model: Exchanges the softmax choice rule with an \(\epsilon\)-greedy policy as an alternative random exploration mechanism. With probability \(\epsilon\), a random option is selected (each with probability 1/64); with probability 1 − \(\epsilon\), the option with the highest UCB value is chosen. The parameter \(\epsilon\) is estimated for each participant.
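The two mechanisms that replace model components in the lesioned variants can be sketched in a few lines of base R. The Bayesian mean tracker shown here is a standard Kalman-filter update for independent options; the noise variance, prior values, and helper names are hypothetical choices rather than the settings used for the reported fits. Splitting the greedy probability equally among options tied for the highest UCB value is also an assumption.

```r
# Sketch of two lesioned components (hypothetical values, not the fitted ones).

# (1) lambda lesion: Bayesian mean tracker (BMT). Each of the 64 options has an
# independent Gaussian belief; observing one tile leaves all others unchanged.
bmt_update <- function(mean, var, choice, reward, noise_var = 1e-4) {
  gain <- var[choice] / (var[choice] + noise_var)  # Kalman gain for the chosen option
  mean[choice] <- mean[choice] + gain * (reward - mean[choice])
  var[choice] <- var[choice] * (1 - gain)
  list(mean = mean, var = var)
}

# (2) tau lesion: epsilon-greedy. With probability epsilon a random option is
# chosen (1/64 each); otherwise the option with the highest UCB value.
epsilon_greedy_probs <- function(ucb_values, epsilon) {
  n <- length(ucb_values)
  p <- rep(epsilon / n, n)
  best <- which(ucb_values == max(ucb_values))
  # ties for the maximum share the greedy probability equally (an assumption)
  p[best] <- p[best] + (1 - epsilon) / length(best)
  p
}

beliefs <- bmt_update(mean = rep(0, 64), var = rep(1, 64), choice = 10, reward = 0.8)

set.seed(1)
ucb_values <- runif(64)                  # hypothetical UCB values
p <- epsilon_greedy_probs(ucb_values, epsilon = 0.1)
```

Unlike the softmax, \(\epsilon\)-greedy assigns the same probability (\(\epsilon/64\)) to every non-maximal option, so the second-best and the worst option are equally likely. In the mean tracker, a reward observed on one tile changes no belief about any other tile, which is the loss of generalization the \(\lambda\) lesion is meant to capture.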
All models were fitted using leave-one-round-out cross-validation based on maximum likelihood estimation. Model fits were evaluated using the sum of negative log-likelihoods across all out-of-sample predictions.
Predictive accuracy was expressed as a pseudo-\(R^2\), which compares this summed log loss to that of a random model, such that \(R^2=0\) corresponds to chance performance and \(R^2=1\) corresponds to theoretically perfect predictions.
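The evaluation scheme can be sketched in base R. A toy model (a softmax over given option values with a single log-temperature parameter) stands in for GP-UCB to keep the example short; the simulated data, the 8 rounds of 25 choices, and all helper names are hypothetical. The pseudo-\(R^2\) is computed as \(1 - \mathrm{nLL}_{\text{model}} / \mathrm{nLL}_{\text{random}}\), assuming the random model assigns probability 1/64 to every option, so that \(\mathrm{nLL}_{\text{random}} = n \log 64\) for \(n\) choices.

```r
# Leave-one-round-out cross-validation and pseudo-R^2 (hypothetical sketch).

pseudo_r2 <- function(nll_model, n_choices, n_options = 64) {
  nll_random <- n_choices * log(n_options)  # random model: p = 1/64 for every choice
  1 - nll_model / nll_random
}

# negative log-likelihood of one round's choices under a softmax with temperature exp(log_tau)
softmax_nll <- function(log_tau, rd) {
  tau <- exp(log_tau)
  sum(vapply(seq_along(rd$choices), function(i) {
    z <- rd$values[i, ] / tau
    log_norm <- max(z) + log(sum(exp(z - max(z))))  # log-sum-exp
    log_norm - z[rd$choices[i]]                     # -log P(chosen option)
  }, numeric(1)))
}

# maximum likelihood estimate of log_tau on the training rounds
fit_log_tau <- function(train_rounds) {
  objective <- function(lt) {
    sum(vapply(train_rounds, function(rd) softmax_nll(lt, rd), numeric(1)))
  }
  optimize(objective, interval = c(log(1e-3), log(10)))$minimum
}

loro_cv <- function(rounds, fit_fun, nll_fun) {
  total_nll <- 0
  for (r in seq_along(rounds)) {
    params <- fit_fun(rounds[-r])                          # fit on all other rounds
    total_nll <- total_nll + nll_fun(params, rounds[[r]])  # evaluate the held-out round
  }
  total_nll
}

simulate_round <- function(n_trials = 25, tau = 0.1) {
  values <- matrix(runif(n_trials * 64), nrow = n_trials)
  choices <- apply(values, 1, function(v) {
    w <- exp(v / tau)
    sample.int(64, 1, prob = w / sum(w))
  })
  list(values = values, choices = choices)
}

set.seed(42)
rounds <- replicate(8, simulate_round(), simplify = FALSE)  # 8 hypothetical rounds
oos_nll <- loro_cv(rounds, fit_log_tau, softmax_nll)
r2 <- pseudo_r2(oos_nll, n_choices = 8 * 25)
```

Because each held-out round is predicted with parameters fitted only to the other rounds, overfitting is penalized rather than rewarded, which is why the summed out-of-sample negative log-likelihood can be compared across models with different components.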
Figure 1: Predictive accuracy of GP-UCB model and lesioned variants.
2.1 Model comparison: Control
GP-UCB vs. lambda lesion: \(t(33)=3.0\), \(p=.005\), \(d=0.2\), \(BF=7.5\)
GP-UCB vs. beta lesion: \(t(33)=3.4\), \(p=.002\), \(d=0.2\), \(BF=19\)
GP-UCB vs. tau lesion: \(t(33)=7.7\), \(p<.001\), \(d=0.6\), \(BF>100\)
2.2 Model comparison: PD on
GP-UCB vs. lambda lesion: \(t(32)=3.4\), \(p=.002\), \(d=0.4\), \(BF=20\)
GP-UCB vs. beta lesion: \(t(32)=3.7\), \(p<.001\), \(d=0.4\), \(BF=40\)
GP-UCB vs. tau lesion: \(t(32)=8.5\), \(p<.001\), \(d=0.9\), \(BF>100\)
2.3 Model comparison: PD off
GP-UCB vs. lambda lesion: \(t(30)=3.6\), \(p=.001\), \(d=0.7\), \(BF=27\)
GP-UCB vs. beta lesion: \(t(30)=5.4\), \(p<.001\), \(d=1.1\), \(BF>100\)
GP-UCB vs. tau lesion: \(t(30)=4.9\), \(p<.001\), \(d=1.0\), \(BF>100\)
2.4 Model-based classification of participants
Code
# classify participants according to model R^2
df_participant_classification <- modelFits %>%
  group_by(id) %>%
  slice_max(order_by = R2, n = 1) %>%
  select(id, group, ModelName, shortname, R2) %>%
  ungroup() %>%
  rename(best_ModelName = ModelName,
         best_shortname = shortname,
         best_R2 = R2)

df_counts <- df_participant_classification %>%
  count(group, best_shortname)

df_percent <- df_counts %>%
  group_by(group) %>%
  mutate(total_in_group = sum(n),
         percent = round((n / total_in_group) * 100, 1)) %>%
  ungroup()

# add most predictive model for each subject to df modelFits
modelFits <- modelFits %>%
  left_join(df_participant_classification, by = c("id", "group"))
We classified participants based on which model achieved the highest cross-validated predictive accuracy (highest \(R^2\); see the participant classification figure). In each group, the GP-UCB model was the most predictive model for the majority of participants (Control: 55.9%, PD on: 57.6%, PD off: 58.1%).
In total, out of 98 participants, 56 (57.1%) were best described by the GP-UCB model, 22 (22.4%) by the lambda lesion model, 13 (13.3%) by the beta lesion model, and 7 (7.1%) by the tau lesion model. The results suggest that all three components of the GP-UCB model are relevant for predicting participants’ behavior.
To better understand the mechanisms underlying the observed behavioral differences, we analyzed the parameters of the Gaussian Process Upper Confidence Bound (GP-UCB) model (Figure 2).
3.0.1 Generalization \(\lambda\)
The parameter \(\lambda\) represents the length-scale in the RBF kernel, which governs the amount of generalization, i.e., to what extent participants assume a spatial correlation between options (higher \(\lambda\) = stronger generalization). Overall, the amount of generalization was similar between groups; the only difference was between controls and PD off patients (moderate evidence).
Control vs. PD on: \(U=678\), \(p=.145\), \(r_{\tau}=.15\), \(BF=.65\)
Control vs. PD off: \(U=731\), \(p=.007\), \(r_{\tau}=.28\), \(BF=3.9\)
PD on vs. PD off: \(U=626\), \(p=.126\), \(r_{\tau}=.16\), \(BF=.44\)
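The group comparisons report a Mann-Whitney U test together with a rank-based effect size \(r_{\tau}\). Assuming \(r_{\tau}\) is Kendall's tau between group membership and the parameter estimates (an assumption, not stated in the text), both quantities can be computed in base R as sketched below. The data are simulated, and the Bayes factors (provided by `statisticalTests.R` in the analyses above) are not reproduced. The helper `tau_from_u` is hypothetical and uses the identity that holds without ties.

```r
# Hypothetical parameter estimates for two groups (simulated, not study data)
set.seed(7)
group_a <- rlnorm(34, meanlog = 0, sdlog = 0.5)
group_b <- rlnorm(31, meanlog = -0.3, sdlog = 0.5)

compare_groups <- function(x, y) {
  test <- wilcox.test(x, y)
  # R reports U as W: the number of pairs (x[i], y[j]) with y[j] not greater than x[i]
  u <- unname(test$statistic)
  # Kendall's tau between group membership and values; x is coded 1 and y coded 0
  # (an assumption), so r_tau > 0 when values in x tend to be larger
  membership <- c(rep(1, length(x)), rep(0, length(y)))
  r_tau <- cor(membership, c(x, y), method = "kendall")
  c(U = u, p = test$p.value, r_tau = r_tau)
}

# Without ties, Kendall's tau-b for this coding is a linear function of U
tau_from_u <- function(u, n1, n2) {
  N <- n1 + n2
  (2 * u - n1 * n2) / sqrt(n1 * n2 * N * (N - 1) / 2)
}

res <- compare_groups(group_a, group_b)
```

With 34 controls and 31 PD off patients (the group sizes implied by the degrees of freedom in the model comparison), `tau_from_u(731, 34, 31)` gives approximately .28 and `tau_from_u(188, 34, 31)` approximately -.46, matching the pairs of \(U\) and \(r_{\tau}\) reported for these groups.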
3.0.2 Exploration bonus \(\beta\)
The parameter \(\beta\) represents the uncertainty bonus, i.e., how much expected rewards are positively inflated by their uncertainty (higher \(\beta\) = more uncertainty-directed exploration). Controls and PD on patients did not differ, and both groups had lower \(\beta\) estimates than the dopamine-depleted patients in the PD off group. These differences suggest that levodopa medication modulated the amount of uncertainty-directed exploration, restoring \(\beta\) to levels comparable to those observed in controls. This aligns with findings from a restless bandit paradigm, where L-dopa reduced directed exploration in healthy volunteers while leaving random exploration unaffected (Chakroun et al., 2020).
Control vs. PD on: \(U=480\), \(p=.315\), \(r_{\tau}=-.10\), \(BF=.48\)
Control vs. PD off: \(U=188\), \(p<.001\), \(r_{\tau}=-.46\), \(BF=81\)
PD on vs. PD off: \(U=220\), \(p<.001\), \(r_{\tau}=-.41\), \(BF=25\)
3.0.3 Random exploration \(\tau\)
The parameter \(\tau\) represents the amount of decision noise, i.e., stochastic variability in the softmax choice rule (higher \(\tau\) = more decision noise, i.e., a more uniform choice distribution; conversely, \(\tau \rightarrow 0 \quad \Rightarrow \quad \text{argmax (greedy)}\)). There were no group differences in the temperature parameter \(\tau\), indicating comparable amounts of random exploration across groups.
Control vs. PD on: \(U=572\), \(p=.896\), \(r_{\tau}=.01\), \(BF=.25\)
Control vs. PD off: \(U=500\), \(p=.730\), \(r_{\tau}=-.04\), \(BF=.27\)
PD on vs. PD off: \(U=470\), \(p=.584\), \(r_{\tau}=-.06\), \(BF=.28\)
4 Relations of model parameters to performance
We assessed the relation between GP-UCB parameter estimates and performance (mean reward) using Kendall's tau, a rank correlation that is invariant to monotonic transformations such as the log transform.
Code
# mean reward per subject across all trials and rounds (practice and bonus round excluded)
df_mean_reward_subject <- dat %>%
  filter(trial != 0 & round %in% 2:9) %>%  # exclude first (randomly revealed) tile and practice round and bonus round
  group_by(id) %>%
  summarise(group = first(group),
            sum_reward = sum(z),
            mean_reward = mean(z),
            sd_reward = sd(z))

df_params_performance <- df_gpucb_params %>%
  left_join(df_mean_reward_subject, by = c("id", "group"))

df_params_performance_wide <- df_gpucb_params %>%
  pivot_wider(names_from = param, values_from = estimate) %>%
  left_join(df_mean_reward_subject, by = c("id", "group"))
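The rationale for using Kendall's tau can be checked directly: because it depends only on the ordering of values, a strictly increasing transformation such as the log leaves it unchanged, whereas Pearson's r changes. The sketch below uses simulated, right-skewed parameter estimates and rewards; the variable names and data are hypothetical, not the study's.

```r
set.seed(3)
n <- 98
beta_est <- rlnorm(n, meanlog = -1, sdlog = 1)                   # hypothetical skewed estimates
mean_reward <- 0.7 - 0.05 * log(beta_est) + rnorm(n, sd = 0.05)  # hypothetical rewards

# Kendall's tau: identical on the raw and the log scale
tau_raw <- cor(beta_est, mean_reward, method = "kendall")
tau_log <- cor(log(beta_est), mean_reward, method = "kendall")

# Pearson's r: depends on the scale of the estimates
pearson_raw <- cor(beta_est, mean_reward)
pearson_log <- cor(log(beta_est), mean_reward)

# p-value of the rank correlation test
kendall_p <- cor.test(log(beta_est), mean_reward, method = "kendall")$p.value
```

Since parameter estimates such as \(\beta\) and \(\tau\) are often strongly skewed, a rank-based measure avoids having to choose between the raw and the log scale.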
The amount of generalization was positively related to obtained rewards, suggesting that participants who learned more about the spatial correlation of rewards performed better. The uncertainty bonus \(\beta\) was negatively correlated with performance, suggesting that an overreliance on directed exploration impairs efficient reward accumulation. The temperature parameter \(\tau\) (random exploration) was not related to obtained rewards.
Figure 3: Correlation of GP-UCB parameters with obtained mean reward across all trials and rounds. Each dot is one participant. The insets show the correlations for a restricted parameter range from 0 to 1.
4.1 Generalization \(\lambda\)
Overall, the extent of generalization was positively related to performance, suggesting that participants who generalized more strongly obtained more rewards:
Overall: \(r_{\tau}=.26\), \(p<.001\), \(BF>100\)
Analysis of parameter estimates on the group level showed that this overall relation was primarily driven by PD on patients, who showed a strong relation, whereas there was no relation in controls or PD off patients:
Control: \(r_{\tau}=.13\), \(p=.288\), \(BF=.39\)
PD on: \(r_{\tau}=.45\), \(p<.001\), \(BF>100\)
PD off: \(r_{\tau}=-.01\), \(p=.973\), \(BF=.23\)
4.2 Exploration bonus \(\beta\)
The exploration bonus \(\beta\) driving uncertainty-directed exploration was negatively related to performance, suggesting that participants who explore too much at the cost of exploiting known high-value options achieve lower performance:
Overall: \(r_{\tau}=-.59\), \(p<.001\), \(BF>100\)
Analysis of parameter estimates at the group level showed that this negative relation was present in all three groups:
Control: \(r_{\tau}=-.43\), \(p<.001\), \(BF>100\)
PD on: \(r_{\tau}=-.61\), \(p<.001\), \(BF>100\)
PD off: \(r_{\tau}=-.60\), \(p<.001\), \(BF>100\)
4.3 Random exploration \(\tau\)
The temperature parameter \(\tau\) of the softmax choice rule, representing random exploration, was not related to performance:
Overall: \(r_{\tau}=-.07\), \(p=.308\), \(BF=.22\)
At the group level, there was likewise no relation in controls or PD on patients, and the evidence for PD off patients was inconclusive:
Control: \(r_{\tau}=-.02\), \(p=.860\), \(BF=.23\)
PD on: \(r_{\tau}=.09\), \(p=.451\), \(BF=.30\)
PD off: \(r_{\tau}=-.23\), \(p=.077\), \(BF=1.1\)
4.3.1 Further analyses
Take the explore/exploit ratio and correlate it with the model parameters.
6 Model parameters of participants best explained by the GP-UCB model
For this analysis, we only considered participants who were best explained by the GP-UCB model. The results are consistent with those obtained with the full sample: no substantial differences in generalization \(\lambda\), marked differences in the exploration bonus \(\beta\), and no differences in random exploration \(\tau\). The only discrepancy concerns generalization: in the full sample, controls and PD off patients differed in \(\lambda\), whereas in this subset they did not.
6.0.1 Generalization \(\lambda\)
The parameter \(\lambda\) represents the length-scale in the RBF kernel, which governs the amount of generalization, i.e., to what extent participants assume a spatial correlation between options (higher \(\lambda\) = stronger generalization). Overall, the amount of generalization was very similar between groups.
Control vs. PD on: \(U=210\), \(p=.402\), \(r_{\tau}=.12\), \(BF=.44\)
Control vs. PD off: \(U=219\), \(p=.150\), \(r_{\tau}=.20\), \(BF=.63\)
PD on vs. PD off: \(U=191\), \(p=.558\), \(r_{\tau}=.08\), \(BF=.34\)
6.0.2 Exploration bonus \(\beta\)
The parameter \(\beta\) represents the uncertainty bonus, i.e., how much expected rewards are positively inflated by their uncertainty (higher \(\beta\) = more uncertainty-directed exploration). Controls and PD on patients did not differ, and both groups had lower \(\beta\) estimates than the dopamine-depleted patients in the PD off group. These differences suggest that levodopa medication modulated the amount of uncertainty-directed exploration, restoring \(\beta\) to levels comparable to those observed in controls. This aligns with findings from a restless bandit paradigm, where L-dopa reduced directed exploration in healthy volunteers while leaving random exploration unaffected (Chakroun et al., 2020).
Control vs. PD on: \(U=182\), \(p=.977\), \(r_{\tau}=.01\), \(BF=.32\)
Control vs. PD off: \(U=55\), \(p<.001\), \(r_{\tau}=-.49\), \(BF=14\)
PD on vs. PD off: \(U=54\), \(p<.001\), \(r_{\tau}=-.49\), \(BF=31\)
6.0.3 Random exploration \(\tau\)
The parameter \(\tau\) represents the amount of decision noise, i.e., stochastic variability in the softmax choice rule (higher \(\tau\) = more decision noise, i.e., a more uniform choice distribution; conversely, \(\tau \rightarrow 0 \quad \Rightarrow \quad \text{argmax (greedy)}\)). There were no group differences in the temperature parameter \(\tau\), indicating comparable amounts of random exploration across groups.
Control vs. PD on: \(U=193\), \(p=.729\), \(r_{\tau}=.05\), \(BF=.35\)
Control vs. PD off: \(U=162\), \(p=.799\), \(r_{\tau}=-.04\), \(BF=.33\)
PD on vs. PD off: \(U=140\), \(p=.358\), \(r_{\tau}=-.13\), \(BF=.44\)
Figure 5: Parameter estimates of the GP-UCB model, estimated through leave-one-round-out cross-validation. Each dot is one participant. Only participants best described by the GP-UCB model are included.
Chakroun, K., Mathar, D., Wiehler, A., Ganzer, F., & Peters, J. (2020). Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making. eLife, 9, e51260.
Giron, A. P., Ciranka, S., Schulz, E., Bos, W. van den, Ruggeri, A., Meder, B., & Wu, C. M. (2023). Developmental changes in exploration resemble stochastic optimization. Nature Human Behaviour, 7(11), 1955–1967. https://doi.org/10.1038/s41562-023-01662-1
Meder, B., Wu, C. M., Schulz, E., & Ruggeri, A. (2021). Development of directed and random exploration in children. Developmental Science, 24(4), e13095. https://doi.org/10.1111/desc.13095
Schulz, E., Wu, C. M., Ruggeri, A., & Meder, B. (2019). Searching for rewards like a child means less generalization and more directed exploration. Psychological Science, 30(11), 1561–1572. https://doi.org/10.1177/0956797619863663
Wu, C. M., Meder, B., & Schulz, E. (2025). Unifying principles of generalization: Past, present, and future. Annual Review of Psychology, 76, 275–302. https://doi.org/10.1146/annurev-psych-021524-110810
Wu, C. M., Schulz, E., Garvert, M. M., Meder, B., & Schuck, N. W. (2020). Similarities and differences in spatial and non-spatial cognitive maps. PLOS Computational Biology, 16(9), e1008149. https://doi.org/10.1371/journal.pcbi.1008149
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2018). Generalization guides human exploration in vast decision spaces. Nature Human Behaviour, 2, 915–924. https://doi.org/10.1038/s41562-018-0467-4